Blang workshop

Alexandre Bouchard-Côté

11/13/2019

Logistics

Other ways to run Blang (I can help with setup after the workshop)

Plan: several examples of Bayesian data science using Blang

Please interrupt me early and often with questions!

Bayesian analysis: pros and cons

Bayesian statistics in the 20th century

If you learned Bayesian analysis from a typical stats curriculum, you may have come away with a bad impression of Bayesian statistics

Conundrum

Bayesian statistics in the 21st century: PPL to the rescue!

Probabilistic Programming Languages (PPL): automating \[\text{Probability model + Data} \xrightarrow{\text{PPL}} \text{Posterior distribution} \]

Why this is revolutionary: potential to build quirky Bayesian methods adapted to the inferential problem (instead of adapting the problem to the method as often done nowadays)

Danger: increasing focus on PPLs such as Stan specializing in “regular data” (e.g. assuming a fixed set of built-in types for latent variables, real and/or integer)

Blang: a PPL for quirky data

Enabling technology: methods based on distribution continua (non-reversible parallel tempering and sequential change of measure)

Introductory example: rocket launches

drawing

Paradox?

Uncertainty estimates

Bayesian inference: high-level picture

  1. Construct a probability model including
    • random variables for what we will measure/observe
    • random variables for the unknown quantities
      • those we are interested in (“parameters”, “predictions”)
      • others that just help us formulate the problem (“nuisance”, “random effects”).
  2. Compute the posterior distribution conditionally on the actual data at hand
  3. Use the posterior distribution to:
    • make predictions (point estimates)
    • estimate uncertainty (credible intervals)
    • choose an action (decision theoretic framework)

Towards a model: biased coins

drawing

Making \(p\) random

drawing

Simple probability model for the rocket launch problem

drawing

Warm-up calculation

drawing

Warm-up calculation: If the first \(n=3\) flips all return heads, what is the posterior probability that the standard (\(p = 1/2\)) coin was picked?

Warm-up calculation, illustrated

drawing

So the answer is: \({\mathbb{P}}(C_1 | H_1, H_2, H_3) = \frac{1/24}{1/24 + 1/3} = 1/9 \approx 0.11\).
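A quick numerical check of this calculation (plain Python, independent of the Blang model): enumerate the three coins \(p \in \{0, 1/2, 1\}\) and apply Bayes’ rule directly.

```python
# Verify P(C_1 | H_1, H_2, H_3) by direct enumeration.
# Three coins picked uniformly at random, with heads probabilities:
coins = [0.0, 0.5, 1.0]
prior = 1.0 / len(coins)

# Joint probability of picking each coin AND observing three heads.
joint = [prior * p**3 for p in coins]

# Posterior for the standard coin (p = 1/2), by Bayes' rule.
posterior_standard = joint[1] / sum(joint)
print(posterior_standard)  # 1/9 ≈ 0.111
```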

PPL implementation

Accessing the first example: click on Q1

model Coins {
  
  random IntVar index ?: latentInt 
  random List<IntVar> flips ?: fixedIntList(1, 1, 1)

  laws { 
  
    index ~ DiscreteUniform(0, 3)
    
    for (IntVar flip : flips) {
      flip | index ~ Bernoulli(index / 2.0)
    }
  }
}

Specifying a decision tree (probability model): the laws block

Statistical notation:

\[\begin{align} I &\sim \text{Uniform}\{0, 1, \dots, K\} \\ X_n | I &\sim \text{Bernoulli}(I/K); n \in \{1, 2, 3\} \end{align}\]

drawing

\[\begin{align}{\mathbb{P}}(X_n = x | I = i) = \begin{cases} i/K, & \text{if } x = 1 \\ 1 - i/K, & \text{if } x = 0 \\ 0, & \text{otherwise} \end{cases} \end{align}\]

Blang notation:

index ~ DiscreteUniform(0, 3)

for (IntVar flip : flips) {
  flip | index ~ Bernoulli(index / 2.0)
}
package blang.distributions

/** Uniform random variable over the contiguous set of integers \(\{m, m+1, \dots, M-1\}\). */
@Samplers(UniformSampler)
model DiscreteUniform {
  random IntVar realization
  
  /** The left point of the set (inclusive). \(m \in (-\infty, M)\) */
  param IntVar minInclusive
  
  /** The right point of the set (exclusive). \(M \in (m, \infty)\) */
  param IntVar maxExclusive
  
  laws {
    logf(minInclusive, maxExclusive) {
      if (maxExclusive - minInclusive <= 0.0) return NEGATIVE_INFINITY
      return -log(maxExclusive - minInclusive)
    }
    logf(realization, minInclusive, maxExclusive) {
      if (minInclusive <= realization &&
      realization < maxExclusive) return 0.0
      else return NEGATIVE_INFINITY
    }
  }
  
  generate(rand) {
    rand.discreteUniform(minInclusive, maxExclusive)
  }
}
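As a rough illustration of what the two `logf` factors compute, here is a plain-Python analogue (my sketch, not Blang API): one factor for the normalization and one for the support indicator.

```python
import math

NEGATIVE_INFINITY = float("-inf")

def discrete_uniform_log_density(realization, min_inclusive, max_exclusive):
    """Log-density of a DiscreteUniform over {m, ..., M-1}, mirroring the
    two logf factors above: a normalization factor and a support check."""
    # Factor 1: normalization, -log(M - m); invalid if the set is empty.
    width = max_exclusive - min_inclusive
    if width <= 0:
        return NEGATIVE_INFINITY
    log_norm = -math.log(width)
    # Factor 2: support indicator, contributes 0.0 inside the set,
    # negative infinity outside.
    if min_inclusive <= realization < max_exclusive:
        return log_norm + 0.0
    return NEGATIVE_INFINITY

print(discrete_uniform_log_density(1, 0, 3))  # -log(3)
```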

Creating a model: DONE!

  1. DONE! Construct a probability model
  2. Compute the posterior distribution conditionally on the actual data at hand
  3. Use the posterior distribution to make prediction/estimate uncertainty

Setting up the posterior computation in Blang

random IntVar index ?: latentInt 
random List<IntVar> flips ?: fixedIntList(1, 1, 1)

Computing the posterior in Blang

Posterior PMF

drawing

Computing the posterior: DONE!

  1. DONE! Construct a probability model including
  2. DONE! Compute the posterior distribution conditionally on the actual data at hand
  3. Use the posterior distribution to make prediction/estimate uncertainty

How to use the posterior distributions

Samples are output in tidy format: for example samples for

random Matrix m ?: latentMatrix(2,2)

will produce a file results/latest/samples/m.csv that looks like this:

sample,row,col,value
0,0,0,-0.12892042622729627
0,0,1,-1.576368198773273
0,1,0,0.4712781753321984
0,1,1,1.104505677917305
1,0,0,1.1399400671309814
1,0,1,-0.7598126921692785
1,1,0,0.5586817875162722
1,1,1,1.0949660195131257
...
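Because the samples are tidy, standard tools can summarize them directly. For instance, a short Python sketch (stdlib only; column names as in the file above, values made up for illustration) computing the posterior mean of each matrix entry:

```python
import csv
import io
from collections import defaultdict

# A few rows in the same tidy format as results/latest/samples/m.csv
# (illustrative values, not actual output).
tidy = """sample,row,col,value
0,0,0,-0.5
0,0,1,1.0
1,0,0,0.5
1,0,1,3.0
"""

sums = defaultdict(float)
counts = defaultdict(int)
for record in csv.DictReader(io.StringIO(tidy)):
    key = (record["row"], record["col"])  # one matrix entry
    sums[key] += float(record["value"])
    counts[key] += 1

# Posterior mean of each entry, averaged over samples.
posterior_mean = {key: sums[key] / counts[key] for key in sums}
print(posterior_mean)  # entry (0,0) averages to 0.0, entry (0,1) to 2.0
```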

Useful things that can be automatically created at the end of a run

Click on the file configuration.txt (left panel): you will find

--postProcessor DefaultPostProcessor

Postprocessors are responsible for summarizing the posterior distributions. They can be customized, but the default one will create:

https://silico.io/abouchard/assignment/template/blang-workshop-nov-2019/q1/coins/latest

Use the posterior: DONE!

  1. DONE! Construct a probability model including
  2. DONE! Compute the posterior distribution conditionally on the actual data at hand
  3. DONE! Use the posterior distribution to make prediction/estimate uncertainty

Pedigree example: composing models and inference over combinatorial spaces

drawing

drawing drawing drawing

Blang code for Mendelian inheritance (Q3/Child.bl): https://silico.io/abouchard/assignment/template/blang-workshop-nov-2019/q3/pedigrees/latest

Blang code for a pedigree (Q3/Family.bl): https://silico.io/abouchard/assignment/template/blang-workshop-nov-2019/q3/pedigrees/latest

drawing

Exercise:

Solution

model Recessive {
  param IntVar numberOfVariantAlleles
  random IntVar traitPresent
  
  laws {
    traitPresent | numberOfVariantAlleles ~ Bernoulli(if (numberOfVariantAlleles == 2) 1.0 else 0.0)
  }
}

Note: deterministic relations such as the one above cause problems in other PPLs such as JAGS/WinBUGS; Blang handles these using a novel tempering strategy (annealing both the temperature and the support simultaneously).

Advanced example: distributions as parameters

See Q4/CensoredExchangeableCount.bl:

Let us demonstrate how we can do Bayesian model selection using this model.

First, notice I used a different inference engine: in configuration.txt:

--engine SCM
--engine.nParticles 20

Warning: more particles should be used in practice

This means that instead of using adaptive non-reversible parallel tempering (PT, the default), we use an adaptive Sequential Change of Measure algorithm (better at estimating the normalization constant \(Z\)).

The estimate of \(\log Z\) is in results/latest/logNormEstimate.csv and is also shown in the console (Log normalization constant estimate: -4668.584992824664)

Exercise:

Solution

model Crispr {

  random CountFrequencies frequencies ?: CountFrequencies::parse("83x1;70x1;66x1;55x1;51x1;48x1;47x1;44x1;42x1;39x1;37x1;31x2;30x2;27x1;26x2;25x5;24x3;23x4;22x2;21x2;20x3;18x2;17x2;16x4;15x5;14x7;13x4;12x9;11x11;10x12;9x12;8x10;7x15;6x16;5x31;4x39;3x54;2x109;1x725")
  
  random RealVar 
    r ?: latentReal,
    p ?: latentReal, 
    lambda ?: latentReal
    
  laws { 
    p ~ Exponential(0.1)
    r ~ Exponential(0.1)
    lambda ~ Exponential(0.1) 
    frequencies | lambda, r, p
      ~ CensoredExchangeableCounts(
          NegativeBinomial::distribution(r, p), 
          lambda
      )
  }
} 

Gives logZ = -2281.094793017699, so this is a MUCH better model!
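With the two normalization constant estimates in hand, the model comparison is just a difference of log evidences (a quick Python check using the two numbers reported above, and assuming equal prior probability on the two models):

```python
# Log normalization constant (log evidence) estimates reported by SCM.
log_z_original = -4668.584992824664  # first model
log_z_crispr = -2281.094793017699    # Crispr model

# Log Bayes factor in favour of the Crispr model (equal model priors).
log_bayes_factor = log_z_crispr - log_z_original
print(log_bayes_factor)  # ≈ 2387.5: overwhelming evidence
```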

Note: no need for parameter counting!

Reproducibility note: running the code twice will produce the same result even in multi-threading mode (the random seed can be changed with --engine.random 1234).

Hierarchical example: getting tidy data into Blang and computing Bayes factors

Motivation:

Key idea: use “side data” to inform the prior

data <- read.csv("failure_counts.csv")
data %>% 
  head() %>% 
  knitr::kable(floating.environment="sidewaystable")
| LV.Type | numberOfLaunches | numberOfFailures |
|---|---|---|
| Aerobee | 1 | 0 |
| Angara A5 | 1 | 0 |
| Antares 110 | 2 | 0 |
| Antares 120 | 2 | 0 |
| Antares 130 | 1 | 1 |
| Antares 230 | 1 | 0 |

How to (badly) use side data

\[\begin{align} p &\sim \text{Beta}(1, 1) \\ X_n | p &\sim \text{Bernoulli}(p); n \in \{1, 2, 3\} \end{align}\]

nHeads | nTrials, p ~ Binomial(nTrials, p)
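With the Beta(1, 1) prior this model is conjugate, so the posterior over \(p\) is available in closed form. A small Python check with hypothetical counts (nTrials = 10, nHeads = 2; my numbers, for illustration only, not from the launch data):

```python
# Beta-Binomial conjugacy: with p ~ Beta(a, b) and
# nHeads | nTrials, p ~ Binomial(nTrials, p),
# the posterior is p | data ~ Beta(a + nHeads, b + nTrials - nHeads).
a, b = 1.0, 1.0            # Beta(1, 1) = Uniform(0, 1) prior
n_trials, n_heads = 10, 2  # hypothetical counts

a_post = a + n_heads
b_post = b + (n_trials - n_heads)

# Posterior mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)  # 3/12 = 0.25
```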

drawing

By the way, to restrict to Ariane 1 only, I used:

...
for (Index<String> rocketType : rocketTypes.indices.filter[key == "Ariane 1"]) {... } 
...

drawing drawing

Towards an improved use of side data

An improved use of side data (not quite fully Bayesian yet!)

counts <- read.csv("failure_counts.csv")
ggplot(counts, aes(x = numberOfFailures / numberOfLaunches)) +
  geom_histogram()

Solution: go fully Bayesian

Recall:

  1. Construct a probability model including
    • random variables for what we will measure/observe
    • random variables for the unknown quantities
      • those we are interested in (“parameters”, “predictions”)
      • others that just help us formulate the problem (“nuisance”, “random effects”).
  2. Compute the posterior distribution conditionally on the actual data at hand
  3. Use the posterior distribution to make predictions, estimate uncertainty, make decisions, etc.

drawing

In our case: just make \(\mu\) and \(s\) random! (or equivalently, \(\alpha\) and \(\beta\))
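The \(\mu\)/\(s\) (mean/concentration) parametrization relates to the standard Beta shape parameters via \(\alpha = \mu s\) and \(\beta = (1-\mu)s\). A quick sanity check in Python (example hyperparameter values of my choosing):

```python
# Moment-matching reparametrization of the Beta distribution:
#   alpha = m * s, beta = (1 - m) * s, so that
#   mean          = alpha / (alpha + beta) = m
#   concentration = alpha + beta           = s
m, s = 0.1, 20.0  # example values for the hyperparameters

alpha = m * s
beta = (1.0 - m) * s

assert abs(alpha / (alpha + beta) - m) < 1e-12  # mean recovered
assert abs((alpha + beta) - s) < 1e-12          # concentration recovered
print(alpha, beta)
```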

Exercise

In: Q5:

Solution

model Hierarchical {

  param GlobalDataSource data
  param Plate<String> rocketTypes
  param Plated<IntVar> numberOfLaunches
  random Plated<RealVar> failureProbabilities
  random Plated<IntVar> numberOfFailures
  random RealVar m ?: latentReal, s ?: latentReal
  
  laws {
    m ~ ContinuousUniform(0, 1)
    s ~ Exponential(0.1)
    for (Index<String> rocketType : rocketTypes.indices) {
      failureProbabilities.get(rocketType) | m, s ~ Beta(m*s, (1.0-m)*s)
      numberOfFailures.get(rocketType)
        | RealVar failureProbability = failureProbabilities.get(rocketType),
          IntVar numberOfLaunch = numberOfLaunches.get(rocketType)
        ~ Binomial(numberOfLaunch, failureProbability)
    }
  }
  
}
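To build intuition for what this model asserts, here is a forward simulation of the same generative process in plain Python (a sketch: the Uniform(0, 1) and Exponential rate 0.1 match the laws block above, but the rocket types and launch counts are made up):

```python
import random

random.seed(1)  # reproducible draw

number_of_launches = {"A": 1, "B": 5, "C": 100}  # hypothetical rockets

# Hyperparameters shared across all rocket types.
m = random.uniform(0.0, 1.0)  # m ~ ContinuousUniform(0, 1)
s = random.expovariate(0.1)   # s ~ Exponential(rate = 0.1)

failures = {}
for rocket, n in number_of_launches.items():
    # Per-type failure probability, drawn from the shared Beta prior.
    p = random.betavariate(m * s, (1.0 - m) * s)
    # Each launch fails independently with probability p.
    failures[rocket] = sum(random.random() < p for _ in range(n))

print(failures)
```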

Summary when there is no sharing:

data <- read.csv("figs/failureProbabilities-summary-one.csv")
data %>% 
  head() %>%
  knitr::kable(floating.environment="sidewaystable")
| X | lv.type | mean | sd | min | median | max |
|---|---|---|---|---|---|---|
| 1 | Ariane 1 | 0.2322357 | 0.1168141 | 0.0150489 | 0.2152880 | 0.6411600 |
| 2 | Ariane 2 | 0.2572539 | 0.1460565 | 0.0055072 | 0.2395823 | 0.7926601 |
| 3 | Ariane 3 | 0.1558292 | 0.0965917 | 0.0044491 | 0.1351953 | 0.6279660 |
| 4 | Ariane 40 | 0.1078332 | 0.0980382 | 0.0001226 | 0.0793672 | 0.5976650 |
| 5 | Ariane 42L | 0.0621791 | 0.0560808 | 0.0001023 | 0.0439521 | 0.3481125 |
| 6 | Ariane 42P | 0.1181305 | 0.0771131 | 0.0021443 | 0.1029950 | 0.4222125 |

Summary with hierarchical sharing:

data <- read.csv("figs/failureProbabilities-summary-hier.csv")
data %>% 
  head() %>%
  knitr::kable(floating.environment="sidewaystable")
| X | lv.type | mean | sd | min | median | max |
|---|---|---|---|---|---|---|
| 1 | Ariane 1 | 0.1169992 | 0.0692933 | 0.0064677 | 0.1036391 | 0.4533803 |
| 2 | Ariane 2 | 0.0998560 | 0.0749805 | 0.0003683 | 0.0816964 | 0.4872174 |
| 3 | Ariane 3 | 0.0799940 | 0.0560602 | 0.0018218 | 0.0661795 | 0.4222133 |
| 4 | Ariane 40 | 0.0471762 | 0.0471762 | 0.0000000 | 0.0344085 | 0.3723142 |
| 5 | Ariane 42L | 0.0371144 | 0.0380083 | 0.0000001 | 0.0263615 | 0.3403815 |
| 6 | Ariane 42P | 0.0644173 | 0.0438645 | 0.0010904 | 0.0549336 | 0.2955533 |

drawing drawing

New higher-level hyperparameters = new problems? No, we are probably OK!

It seems we have introduced a new problem: we again have hyperparameters, namely those of the priors on \(\mu\) and \(s\). Here we picked \(\mu \sim {\text{Beta}}(1,1) = {\text{Unif}}(0, 1)\) and \(s \sim \text{Exponential}(1/10)\)

Key point: yes, but now we are less sensitive to these choices!

Why? Heuristic: say you have a random variable connected to some hyper-parameters (grey squares) and random variables connected to data (circles)

Before going hierarchical: for maiden/early flights we had

drawing

After going hierarchical:

drawing

Example: permutations and user defined data types

Unidentifiable example: how Blang works under the hood

drawing

Unidentifiability

When there are too many parameters, you may not be able to recover them even if you had “infinite data”

Modifications to the coin/rocket example

\[\begin{align} p &\sim {\text{Unif}}(0, 1) \\ X_n | p &\sim \text{Bernoulli}(p); n \in \{1, 2, 3\} \end{align}\]

nHeads | nTrials, p ~ Binomial(nTrials, p)
model Unidentifiable {
  ...
  laws {
    p1 ~ ContinuousUniform(0.0, 1.0)
    p2 ~ ContinuousUniform(0.0, 1.0)
    nHeads | nTrials, p1, p2 ~ Binomial(nTrials, p1 * p2)
  }
}
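The problem is visible directly in the likelihood: it depends on p1 and p2 only through the product p1 * p2, so very different parameter pairs with the same product are indistinguishable, even with infinite data. A quick Python check:

```python
from math import comb

def likelihood(n_heads, n_trials, p1, p2):
    """Binomial likelihood of the unidentifiable model: it depends on
    (p1, p2) only through the product q = p1 * p2."""
    q = p1 * p2
    return comb(n_trials, n_heads) * q**n_heads * (1 - q)**(n_trials - n_heads)

# Two very different parameter pairs with the same product 0.3:
like_a = likelihood(3, 10, 0.5, 0.6)
like_b = likelihood(3, 10, 1.0, 0.3)
print(like_a, like_b)  # identical likelihoods
```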

drawing

Very brief overview of Blang’s posterior inference engine

Annealing

drawing

Combination of two ideas: “annealing” and “parallel chains”

\[\pi_{\beta}(x) = (\pi(x))^{\beta}\]

where \(\beta\) controls an interpolation between:

Let us call \(\beta\) the annealing parameter

Now we have several different “landscapes”, from smooth to rough. We will explore all of them simultaneously!
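A small numerical illustration of this interpolation (my own sketch, not Blang’s internals): raise a bimodal density to a power \(\beta \in (0, 1]\) and watch the barrier between the modes shrink.

```python
import math

def bimodal(x):
    # Unnormalized bimodal target: two well-separated, narrow Gaussian bumps.
    return math.exp(-(x - 3.0)**2 / 0.1) + math.exp(-(x + 3.0)**2 / 0.1)

def peak_to_valley(beta):
    """Ratio of the annealed density pi_beta(x) = pi(x)**beta at a mode
    (x = 3) to its value in the valley between the modes (x = 0)."""
    return bimodal(3.0)**beta / bimodal(0.0)**beta

# beta = 1: the original rough landscape, with a huge barrier between modes.
# small beta: a much smoother landscape that a sampler can easily cross.
print(peak_to_valley(1.0), peak_to_valley(0.01))
```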

Swaps:

drawing

Non-reversible parallel tempering

Parallel Tempering: Reversible (top) versus non-reversible (bottom)

drawing